Weakly-supervised object detection (WSOD) models attempt to leverage image-level annotations in lieu of accurate but costly-to-obtain object localization labels. This often leads to substandard object detection and localization at inference time. To tackle this issue, we propose D2DF2WOD, a Dual-Domain Fully-to-Weakly Supervised Object Detection framework that leverages synthetic data, annotated with precise object localization, to supplement a natural image target domain, where only image-level labels are available. In its warm-up domain adaptation stage, the model learns a fully-supervised object detector (FSOD) to improve the precision of the object proposals in the target domain, and at the same time learns target-domain-specific and detection-aware proposal features. In its main WSOD stage, a WSOD model is specifically tuned to the target domain. The feature extractor and the object proposal generator of the WSOD model are built upon the fine-tuned FSOD model. We test D2DF2WOD on five dual-domain image benchmarks. The results show that our method consistently improves object detection and localization compared with state-of-the-art methods.
translated by 谷歌翻译
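Reliable automatic classification of colonoscopy images is of great significance for assessing the stage of colonic lesions and formulating appropriate treatment plans. However, the task is challenging because uneven brightness, location variability, inter-class similarity, and intra-class dissimilarity all reduce classification accuracy. To address these issues, we propose a Fourier-based Frequency Complex Network (FFCNet) for colon disease classification. Specifically, FFCNet is a novel complex network that combines complex convolutional networks with frequency learning to overcome the loss of phase information caused by real convolution operations. In addition, the Fourier transform moves the average brightness of an image to a single point in the spectrum (the DC component), which alleviates the effect of uneven brightness by decoupling image content from brightness. Moreover, the image patch scrambling module in FFCNet generates random local spectral blocks, enabling the network to learn long-range and local disease-specific features and improving its discriminative ability on hard samples. We evaluate the proposed FFCNet on an in-house dataset of 2568 colonoscopy images. It outperforms previous state-of-the-art methods, reaching an accuracy of 86.35%, a 4.46% improvement in accuracy. The project page with code is available at https://github.com/soleilssss/ffcnet.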
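The statement about the DC component can be checked on a toy example. The sketch below is not taken from FFCNet; it uses a hypothetical helper `dft2`, a naive unnormalised 2D discrete Fourier transform written with the standard library only. It illustrates the simplest case, a uniform brightness offset, which changes only the coefficient at frequency (0, 0).

```python
import cmath

def dft2(img):
    """Naive unnormalised 2D DFT of a small image given as a list of rows."""
    h, w = len(img), len(img[0])
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            total = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2j * cmath.pi * (u * y / h + v * x / w)
                    total += img[y][x] * cmath.exp(angle)
            out[u][v] = total
    return out

# Toy 4x4 "image" and a copy with every pixel brightened by the same amount.
image = [[1, 2, 3, 4],
         [4, 3, 2, 1],
         [0, 5, 0, 5],
         [2, 2, 7, 1]]
offset = 10
brighter = [[p + offset for p in row] for row in image]

spec, spec_bright = dft2(image), dft2(brighter)

# With this convention the DC coefficient is the pixel sum (mean times pixel count).
print("DC before:", spec[0][0].real, "DC after:", spec_bright[0][0].real)
```

Every coefficient other than the DC term is identical for the two images up to floating-point error, which is the sense in which content and mean brightness are decoupled in the spectrum.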
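This technical report presents an effective method for motion forecasting in autonomous driving. We develop a Transformer-based method for input encoding and trajectory prediction. In addition, we propose a Temporal Flow Header to enhance the trajectory encoding. Finally, an efficient K-means ensemble method is used. With our Transformer network and ensemble method, we won first place in the Argoverse 2 Motion Forecasting Challenge with a state-of-the-art brier-minFDE score of 1.90.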
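For readers unfamiliar with the metric quoted above, the sketch below shows how brier-minFDE is commonly described for multi-modal forecasts: the smallest final displacement error among the predicted trajectories, plus a penalty of (1 - p)^2, where p is the probability assigned to that best trajectory. This is an illustrative reading of the metric, not code from the report; the function name `brier_min_fde` and the data layout are assumptions.

```python
import math

def brier_min_fde(forecasts, probabilities, ground_truth):
    """Sketch of brier-minFDE for a single agent.

    forecasts: list of trajectories, each a list of (x, y) points.
    probabilities: one confidence per trajectory.
    ground_truth: the observed trajectory as a list of (x, y) points.
    """
    gx, gy = ground_truth[-1]
    best_error, best_prob = None, None
    for trajectory, prob in zip(forecasts, probabilities):
        fx, fy = trajectory[-1]
        # Final displacement error: distance between the two endpoints.
        error = math.hypot(fx - gx, fy - gy)
        if best_error is None or error < best_error:
            best_error, best_prob = error, prob
    # Penalise low confidence in the trajectory that turned out best.
    return best_error + (1.0 - best_prob) ** 2

# Hypothetical two-mode forecast for one agent.
forecasts = [[(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 1.0)]]
probabilities = [0.5, 0.5]
ground_truth = [(0.0, 0.0), (0.0, 0.0)]
print(brier_min_fde(forecasts, probabilities, ground_truth))
```

Under this reading, a forecaster is rewarded both for having one accurate mode and for assigning that mode a high probability.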
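This paper introduces the system submitted by the Yidun NISP team to the video keyword wakeup challenge. We propose a Mandarin keyword spotting (KWS) system with several novel and effective improvements, including a big backbone (B) model, a keyword biasing (B) mechanism, and the introduction of syllable modeling units (S). Accordingly, we abbreviate the overall system as BBS-KWS. The BBS-KWS system consists of an end-to-end automatic speech recognition (ASR) module and a KWS module. The ASR module converts speech features into text representations; it applies a big backbone network to the acoustic model and takes syllable modeling units into account. In addition, the keyword biasing mechanism is used to improve the recall of keywords in the ASR inference stage. The KWS module applies multiple criteria to determine the absence or presence of the keywords, such as multi-stage matching, fuzzy matching, and the connectionist temporal classification (CTC) prefix score. To further improve our system, we conduct semi-supervised learning on the CN-Celeb dataset for better generalization. In the VKW task, the BBS-KWS system achieves significant gains over the baseline and won first place in two tracks.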
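The abstract does not say how fuzzy matching is implemented. One plausible reading, sketched below as an assumption rather than as the BBS-KWS method, is approximate substring matching over syllable sequences: a keyword counts as present if some stretch of the ASR transcript is within a small edit distance of it. The function names, the pinyin syllables, and the `max_errors` threshold are all hypothetical.

```python
def min_substring_edit_distance(keyword, transcript):
    """Smallest edit distance between `keyword` and any contiguous part of
    `transcript`; both are lists of syllables."""
    # Row 0 is all zeros: a match may start anywhere in the transcript for free.
    previous = [0] * (len(transcript) + 1)
    for i, key_syllable in enumerate(keyword, start=1):
        current = [i]  # matching i keyword syllables against nothing costs i
        for j, heard in enumerate(transcript, start=1):
            substitution = previous[j - 1] + (0 if key_syllable == heard else 1)
            deletion = previous[j] + 1       # keyword syllable missing in transcript
            insertion = current[j - 1] + 1   # extra syllable in transcript
            current.append(min(substitution, deletion, insertion))
        previous = current
    # A match may also end anywhere, so take the best value in the last row.
    return min(previous)

def keyword_present(keyword, transcript, max_errors=1):
    """Declare the keyword present if it matches with at most `max_errors` edits."""
    return min_substring_edit_distance(keyword, transcript) <= max_errors

keyword = ["ni", "hao", "xiao", "yi"]
transcript = ["qing", "ni", "hao", "xiao", "li", "kai", "deng"]
print(keyword_present(keyword, transcript))
```

In a real system a criterion like this would be combined with the other signals the abstract lists, such as multi-stage matching and the CTC prefix score.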
High-utility sequential pattern mining (HUSPM) has emerged as an important topic due to its wide application and considerable popularity. However, because the search space explodes combinatorially when the utility threshold is low or the data is large-scale, solving the HUSPM problem can be time-consuming and memory-costly. Several algorithms have been proposed for this problem, but they remain expensive in terms of running time and memory usage. In this paper, to solve this problem more efficiently, we design a compact structure called sequence projection (seqPro) and propose an efficient algorithm, namely discovering high-utility sequential patterns with the seqPro structure (HUSP-SP). HUSP-SP utilizes the compact seq-array to store the necessary information in a sequence database. The seqPro structure is designed to efficiently calculate candidate patterns' utilities and upper bound values. Furthermore, a new upper bound on utility, namely tighter reduced sequence utility (TRSU), and two search-space pruning strategies are utilized to improve the mining performance of HUSP-SP. Experimental results on both synthetic and real-life datasets show that HUSP-SP can significantly outperform the state-of-the-art algorithms in terms of running time, memory usage, search space pruning efficiency, and scalability.
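To make the notion of pattern utility concrete, the sketch below computes it by brute-force dynamic programming under the usual HUSPM definition: the utility of a pattern in one sequence is the maximum, over all of its occurrences, of the summed item utilities (quantity times unit utility), and its utility in the database is the sum over sequences. This is not the seqPro structure or the TRSU bound from the paper. As a simplifying assumption each sequence element holds a single item, and all names and data are hypothetical.

```python
def utility_in_sequence(pattern, sequence, unit_utility):
    """Maximum utility of `pattern` over all its occurrences in one sequence.

    sequence: list of (item, quantity) pairs in time order.
    Returns 0 if the pattern does not occur.
    """
    m = len(pattern)
    # best[k]: best utility of matching the first k pattern items so far.
    best = [None] * (m + 1)
    best[0] = 0
    for item, quantity in sequence:
        gain = quantity * unit_utility[item]
        # Go backwards so one sequence position is used at most once per match.
        for k in range(m, 0, -1):
            if pattern[k - 1] == item and best[k - 1] is not None:
                candidate = best[k - 1] + gain
                if best[k] is None or candidate > best[k]:
                    best[k] = candidate
    return best[m] if best[m] is not None else 0

def utility_in_database(pattern, database, unit_utility):
    """Sum of per-sequence utilities; compare against a minimum utility threshold."""
    return sum(utility_in_sequence(pattern, s, unit_utility) for s in database)

unit_utility = {"a": 2, "b": 1, "c": 3}
database = [
    [("a", 1), ("b", 2), ("a", 3), ("c", 1)],
    [("b", 1), ("c", 2)],
    [("a", 2), ("c", 2), ("c", 1)],
]
print(utility_in_database(["a", "c"], database, unit_utility))
```

Because utility is neither monotone nor anti-monotone in pattern length, a miner cannot prune on this value directly, which is why upper bounds such as the TRSU mentioned above are needed.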
Migraine is a high-prevalence and disabling neurological disorder. However, information on migraine management in real-world settings could be limited to traditional health information sources. In this paper, we (i) verify that there is substantial migraine-related chatter available on social media (Twitter and Reddit), self-reported by migraine sufferers; (ii) develop a platform-independent text classification system for automatically detecting self-reported migraine-related posts; and (iii) conduct analyses of the self-reported posts to assess the utility of social media for studying this problem. We manually annotated 5750 Twitter posts and 302 Reddit posts. Our system achieved an F1 score of 0.90 on Twitter and 0.93 on Reddit. Analysis of information posted by our 'migraine cohort' revealed a wealth of relevant information about migraine therapies and the patient sentiments associated with them. Our study forms the foundation for conducting an in-depth analysis of migraine-related information using social media data.
We present ResNeRF, a novel geometry-guided two-stage framework for indoor scene novel view synthesis. Recognizing that good geometry greatly boosts the performance of novel view synthesis, and to avoid the geometry ambiguity issue, we propose to characterize the density distribution of the scene with a base density estimated from scene geometry and a residual density parameterized by the geometry. In the first stage, we focus on geometry reconstruction based on an SDF representation, which leads to a good geometric surface of the scene and a sharp density. In the second stage, the residual density is learned based on the SDF learned in the first stage to encode more details about the appearance. In this way, our method can better learn the density distribution with the geometry prior for high-fidelity novel view synthesis while preserving the 3D structures. Experiments on large-scale indoor scenes with many less-observed and textureless areas show that, given a good 3D surface, our method achieves state-of-the-art performance for novel view synthesis.
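Real-time tracking of human body motion is crucial for interactive and immersive experiences in AR/VR. However, only very limited sensor data about the body is available from standalone wearable devices such as HMDs (head-mounted devices) or AR glasses. In this work, we present a reinforcement learning framework that takes in sparse signals from an HMD and two controllers and simulates plausible and physically valid full-body motions. Using high-quality full-body motion as dense supervision during training, a simple policy network can learn to output appropriate torques for the character to walk and jog while closely following the input signals. Our results demonstrate leg motions that are surprisingly similar to the ground truth without any observations of the lower body, even when the input is only the 6D transformations of the HMD. We also show that a single policy can be robust to diverse locomotion styles, different body sizes, and novel environments.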
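This paper is concerned with two-player zero-sum Markov games, arguably the most basic setting in multi-agent reinforcement learning, with the goal of learning a Nash equilibrium (NE) sample-optimally. All prior results suffer from at least one of two obstacles: the curse of multiple agents and the barrier of long horizon, regardless of the sampling protocol in use. We take a step towards settling this problem, assuming access to a flexible sampling mechanism: the generative model. Focusing on non-stationary finite-horizon Markov games, we develop a learning algorithm $\mathsf{Nash}\text{-}\mathsf{Q}\text{-}\mathsf{FTRL}$ and an adaptive sampling scheme that leverage the optimism principle in adversarial learning (particularly the Follow-the-Regularized-Leader (FTRL) method), with a delicate design of bonus terms that ensures certain decomposability under the FTRL dynamics. Our algorithm learns an $\varepsilon$-approximate Markov NE policy using $$ \widetilde{O}\bigg( \frac{H^4 S(A+B)}{\varepsilon^2} \bigg) $$ samples, where $S$ is the number of states, $H$ is the horizon, and $A$ (resp. $B$) denotes the number of actions for the max-player (resp. min-player). This is nearly un-improvable in a minimax sense. Along the way, we derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities, which might be of independent interest.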
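As background for the FTRL component, the sketch below runs FTRL with an entropy regulariser (equivalently, exponential weights) in self-play on a single zero-sum matrix game and measures how far the averaged strategies are from a Nash equilibrium. It is not the Nash-Q-FTRL algorithm: there are no states, no horizon, no sampling, and no bonus terms. The payoff matrix, the step-size choice, and all function names are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def ftrl_self_play(payoff, rounds):
    """Entropy-regularised FTRL for both players; returns averaged strategies.

    payoff[i][j] is what the row (max) player receives from the column (min) player.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    eta_row = math.sqrt(2 * math.log(n_rows) / rounds)
    eta_col = math.sqrt(2 * math.log(n_cols) / rounds)
    cum_row = [0.0] * n_rows  # cumulative payoff of each row action
    cum_col = [0.0] * n_cols  # cumulative payoff conceded by each column action
    avg_row, avg_col = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(rounds):
        x = softmax([eta_row * g for g in cum_row])    # max-player favours high payoff
        y = softmax([-eta_col * g for g in cum_col])   # min-player favours low payoff
        row_values = [sum(payoff[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows)]
        col_values = [sum(payoff[i][j] * x[i] for i in range(n_rows)) for j in range(n_cols)]
        for i in range(n_rows):
            cum_row[i] += row_values[i]
            avg_row[i] += x[i] / rounds
        for j in range(n_cols):
            cum_col[j] += col_values[j]
            avg_col[j] += y[j] / rounds
    return avg_row, avg_col

def duality_gap(payoff, x, y):
    """Best-response value against y minus best-response value against x."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    best_for_row = max(sum(payoff[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows))
    best_for_col = min(sum(payoff[i][j] * x[i] for i in range(n_rows)) for j in range(n_cols))
    return best_for_row - best_for_col

PAYOFF = [[1.0, -1.0], [-1.0, 0.5]]  # hypothetical game with entries in [-1, 1]
x_bar, y_bar = ftrl_self_play(PAYOFF, rounds=2000)
print("duality gap of averaged play:", duality_gap(PAYOFF, x_bar, y_bar))
```

The duality gap of the averaged strategies is bounded by the sum of the two players' average regrets, so a tighter regret bound for FTRL translates directly into needing fewer rounds for a given accuracy.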
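Missing data is an inevitable and common problem in data-driven intelligent transportation systems (ITS). Over the past decade, scholars have done much research on recovering missing traffic data, but how to make full use of spatio-temporal traffic patterns to improve recovery performance remains an open problem. Targeting the spatio-temporal characteristics of traffic speed data, this paper treats the recovery of missing data as a matrix completion problem and proposes a spatio-temporal traffic data completion method based on latent feature analysis, which discovers spatio-temporal patterns and the underlying structure from incomplete data to complete the recovery task. To this end, we introduce spatial and temporal correlations to capture the main underlying features of each dimension. Finally, these latent features are applied to recover the traffic data through latent feature analysis. The experimental and evaluation results show that the model achieves small values on the evaluation criteria, which indicates better performance. The results also show that the model can accurately estimate continuously missing data.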
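The matrix-completion view can be illustrated with a deliberately small sketch. It fits one latent feature per road segment and one per time slot to the observed speeds by alternating least squares, then fills the gaps with their product. This is not the paper's model: it omits the spatial and temporal correlation terms, uses a single latent dimension, and the function name and data are hypothetical.

```python
def complete_rank1(speeds, iterations=100):
    """Fill missing entries (None) of a segments-by-time speed matrix.

    Fits speeds[i][j] ~ u[i] * v[j] on observed entries with alternating
    least squares, then predicts the missing ones.
    """
    n_rows, n_cols = len(speeds), len(speeds[0])
    u = [1.0] * n_rows  # latent feature per road segment
    v = [1.0] * n_cols  # latent feature per time slot
    for _ in range(iterations):
        for i in range(n_rows):
            observed = [j for j in range(n_cols) if speeds[i][j] is not None]
            denom = sum(v[j] ** 2 for j in observed)
            if denom > 0:  # skip rows with nothing observed
                u[i] = sum(speeds[i][j] * v[j] for j in observed) / denom
        for j in range(n_cols):
            observed = [i for i in range(n_rows) if speeds[i][j] is not None]
            denom = sum(u[i] ** 2 for i in observed)
            if denom > 0:  # skip columns with nothing observed
                v[j] = sum(speeds[i][j] * u[i] for i in observed) / denom
    return [
        [speeds[i][j] if speeds[i][j] is not None else u[i] * v[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]

# Hypothetical speeds (km/h): 3 road segments by 4 time slots, one reading missing.
observed_speeds = [
    [None, 50.0, 30.0, 55.0],
    [48.0, 40.0, 24.0, 44.0],
    [30.0, 25.0, 15.0, 27.5],
]
completed = complete_rank1(observed_speeds)
print("estimated missing speed:", completed[0][0])
```

The toy data is exactly the product of a per-segment factor and a per-slot factor, so the missing reading is recoverable. Real traffic data needs more latent dimensions, regularisation, and the correlation structure that the abstract describes.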